Cross Language Information Retrieval: an Experiment in Bilingual News Article Alignment from the Internet using MT

Authors

  • Nigel Collier
  • Hideki Hirakawa
  • Akira Kumano
Abstract

Cross Language Information Retrieval (CLIR) offers the potential for users to search document collections in foreign languages. This is particularly relevant now that the Internet has become a global information source. Machine translation (MT) has a key role in bridging the gap between the language of the user's query and that of the document collection, and in helping the user understand the search results through gisting. In this paper we reformulate the CLIR task as text alignment on a database of Reuter news articles. We show preliminary results for CLIR using relevance feedback with machine-translated queries from Japanese into English.
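The idea of treating CLIR as text alignment can be illustrated with a small sketch. It assumes the Japanese articles have already been machine-translated into English, and it ranks the English articles by plain TF-IDF cosine similarity. This is a generic stand-in, not the system or the relevance feedback method used in the paper; the function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(translated_articles, english_articles):
    """For each MT-translated article, return the index of its best English match."""
    vectors = tfidf_vectors(list(translated_articles) + list(english_articles))
    q_vecs = vectors[:len(translated_articles)]
    d_vecs = vectors[len(translated_articles):]
    return [max(range(len(d_vecs)), key=lambda j: cosine(q, d_vecs[j]))
            for q in q_vecs]


# Hypothetical data: rough MT output on the left, English originals on the right.
translated = [
    "Football team wins final of championship",
    "Bank of center raises the interest rate for inflation",
]
english = [
    "The central bank raised interest rates to curb inflation.",
    "The football team won the championship final on Sunday.",
]
matches = align(translated, english)
print(matches)
```

Each entry of matches is the position of the English article judged closest to the corresponding translated article. A relevance feedback step, as mentioned in the abstract, would expand each translated query with terms from its top-ranked articles and search again; that step is omitted here.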


Similar resources

How Similar are Chinese and Japanese for Cross-Language Information Retrieval?

For NTCIR Workshop 5 UC Berkeley participated in the bilingual task of the CLIR track. Our focus was on Chinese topic searches against the Japanese News document collection, and on Japanese topic search against the Chinese News Document Collection. Extending our work of NTCIR 4 workshop, we performed search experiments to segment and use Chinese search topics directly as if they were Japanese t...


Machine Translation versus Dictionary Term Translation - A Comparison for English-Japanese News Article Alignment

Bilingual news article alignment methods based on multilingual information retrieval have been shown to be successful for the automatic production of so-called noisy-parallel corpora. In this paper we compare the use of machine translation (MT) to the commonly used dictionary term lookup (DTL) method for Reuter news article alignment in English and Japanese. The results show the trade-off betwe...


Cross-lingual Information Retrieval Model based on Bilingual Topic Correlation

How to construct relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take into account bilingual semantic relationships. The paper proposes a new model that builds semantic relationships between bilingual parallel documents via partial least squares (PLS)....


The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval

Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that are expressed in another. This paper identifies eight types of terms that affect retrieval effectiveness in CLIR applications through their coverage by general-purpose bilingual term...


Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian languages

This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by the visitors of the website of the Nordic council containing six different languages. In order to compare how well different types of bilingual dictionaries covered the most common ...



Publication date: 2000